Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from image to 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images that relies on an overcomplete auto-encoder to learn a high-dimensional latent pose representation and account for joint dependencies. We demonstrate that our approach outperforms state-of-the-art methods both in terms of structure preservation and prediction accuracy.